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■ Abstract. Recent likelihood theory produces p- values that have remarkable accuracy 
CN , and wide applicability. The calculations use familiar tools such as maximum likelihood 

P5 I values (MLEs), observed information and parameter rescaling. The usual evaluation 

^ ^ ' of such p- values is by simulations, and such simulations do verify that the global dis- 

! tribution of the p- values is uniform(0, 1), to high accuracy in repeated sampling. The 

' derivation of the p-values, however, asserts a stronger statement, that they have a uni- 

form(0, 1) distribution conditionally, given identified precision information provided 
by the data. We take a simple regression example that involves exact precision infor- 
mation and use large sample techniques to extract highly accurate information as to 
the statistical position of the data point with respect to the parameter: specifically, we 
examine various p-values and Bayesian posterior survivor s-values for validity. With 
observed data we numerically evaluate the various p-values and s-values, and we also 
record the related general formulas. We then assess the numerical values for accuracy 
using Markov chain Monte Carlo (McMC) methods. We also propose some third-order 
likelihood-based procedures for obtaining means and variances of Bayesian posterior 
distributions, again followed by McMC assessment. Finally we propose some adap- 

■ tive McMC methods to improve the simulation acceptance rates. All these methods 
are based on asymptotic analysis that derives from the effect of additional data. And 
the methods use simple calculations based on familiar maximizing values and related 
informations. 

OO ! The example illustrates the general formulas and the ease of calculations, while the 

' McMC assessments demonstrate the numerical validity of the p-values as percentage 

position of a data point. The example, however, is very simple and transparent, and 
thus gives little indication that in a wide generality of models the formulas do accu- 
}2j ■ rately separate information for almost any parameter of interest, and then do give 

5^ I accurate p-value determinations from that information. As illustration an enigmatic 

problem in the literature is discussed and simulations are recorded; various examples 
in the literature are cited. 

Key words and phrases: Asymptotics, Bayesian posterior s-value, canonical parame- 
ter, default prior, higher order, likelihood, maximum likelihood departure, Metropolis- 
Hastings algorithm, p-value, regression example, third order. 
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1. INTRODUCTION 

We explore various large sample and likelihood 
methods for obtaining Bayesian and frequentist p- 
values from a regular statistical model and data. 
Numerical values are obtained for a simple example 
that indicates the ease with which the methods can 
be applied, given the typically available maximum 
likelihood and related calculations. The general for- 
mulas are presented and discussed. 

The example, ignoring the nonnormality of er- 
ror, is simple and transparent: one could plot the 
data, calculate means and standard deviations, do 
the bootstrap, or even record likelihood and get 
broadly about the same answer. But the large sam- 
ple techniques, more exactly data- accretion tech- 
niques, provide accurate separation of component 
parameter information, precisely summarize the 
available information, and give accurate determina- 
tions of corresponding p- values. 

In Section 2 we take a pragmatic approach and ob- 
tain p-values using simple departure measures and 
distributional approximations related to the Central 
Limit Theorem. Then in Section 3 we formally refer- 
ence the statistical model and obtain a p- value based 
on the signed likelihood root. In Section 4 we add 
a widely accepted default prior and obtain the pos- 
terior survivor value, the analogue of the frequen- 
tist p-value. For the example these require three- 
dimensional integration. 

But then in Section 5 we examine recently de- 
veloped likelihood-based approximations that have 
third-order accuracy; numerical values are obtained 
for the example, and the general formulas are dis- 
cussed. In Section 6 we discuss the corresponding 
frequentist third-order |3-values. Numerical values 
for the example are then presented together with 
various intermediate values that indicate how the 
calculations proceed. 

In Sections 7 and 8 we consider exact p-values for 
the preceding methods, as derived by Markov chain 
Monte Carlo. As part of this we note that a = 
4 X 10^ simulation in the particular context gives 
about two- figure accuracy for probabilities, about 
the same as the third-order approximation methods. 

In Section 9 we briefly discuss the role of precision 
information in the Bayesian and frequentist con- 
texts. Section 10 looks directly at Bayesian means 
and variances and how they can be approximated 
by recent likelihood-based methods. Again Markov 
chain Monte Carlo is used to evaluate the accuracy. 



Section 11 presents some intuitive thoughts on the 
Metropolis-Hastings step in Markov chain Monte 
Carlo and then proposes several asymptotic and 
adaptive modifications of the direct McMC approach; 
these are explored for p- values in Sections 12 and 13. 
A controversial example is examined in Section 14, 
and some concluding remarks are recorded in a dis- 
cussion Section 15. 

The Bayesian and frequentist methods give about 
the same answer for the example. In fact, for the par- 
ticular example they give theoretically the same an- 
swer, a consequence of the judicious choice of default 
prior for the Bayesian analysis. We do not address 
here the manner of making such judicious choices 
or how the choice typically needs to be targeted on 
the particular parameter of interest; this will be ad- 
dressed subsequently. 

2. A SIMPLE EXAMPLE: DEPARTURE OF 
DATA FROM PARAMETER VALUE 

Consider an example to illustrate the formulas 
coming from large sample or, more exactly, data- 
accretion techniques: a small data set involving a 
response y with possible linear dependence on a re- 
lated variable x: 



(1) 



X —3 


-2 


-1 


1 


2 


3 


y -2.68 


-4.02 


-2.91 0.22 


0.38 


-0.28 


0.03 



The response variability is taken to be thicker tailed 
than the normal, say the frequently suggested Stu- 
dent (7) distribution. Then with linear dependence 
and constant response variability, we have the model 

7 

f{y- 9) = J] h{a~' {y^-a- Pxi)}, 

i=l 

where h{z) = {r(4)/r(l/2)r(7/2) V7}(1 + z^/7)~'^ 
is the Student(7) density. Or using quantile func- 
tion form, we can write yi = a + (3xi + azi, where 
the latent errors Zi are independent Student(7). Now 
suppose we are interested in assessing the response 
dependence on x as given by the regression param- 
eter P, with particular interest in whether (3=1. 

As background we note that the response data 
were in fact generated from the given model with 
Student(7) error and then rounded to two decimal 
places; the parameter values used to generate the 
data were q = 0, (3 = 1 and a = 1. In passing we 
note that a is an error scaling and does not record 
directly the error standard deviation. Also there is 
no implied connection between the number of ob- 
servations n = 7 and the degrees of freedom for the 
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error density df = 7. The example has simple and 
transparent structure, and we use it to examine var- 
ious frequentist and Bayesian assessment methods; 
we then apply Markov chain Monte Carlo sampling 
to check the distributional validity of the resulting 
values. 

The example is simple and transparent and we 
could probably do as well by plotting or by least 
squares and standard deviations. The likelihood 
model theory, however, gives a precise separation of 
information concerning parameters of interest and 
an accurate determination of the p- value or percent- 
age position of data with respect to a value for the 
parameter of interest. As a more general illustra- 
tion, we later record simulation data for a challeng- 
ing example from the literature. We also cite various 
examples that have been examined in the literature. 

For the example considered, a pragmatic first step 
is to use least squares to separate out the general 
location and the linear dependence on the related 
variable x. The fitted values for a and (3 are 

a = -1.322857, 6 = 0.675000, 

with residual length s = (SSR)^/^ = 2.660046 ob- 
tained from the sum of squares of residuals. 

The parameter /3 records the indicated linear de- 
pendence of y on a;. To assess the value /9 = 1 in 
the presence of the data, we can reasonably exam- 
ine the raw departure b — f3 = —0.325000, and then 
standardize it by its estimated standard deviation to 
obtain a standardized measure of departure of data 
from parameter value, giving 



t 



(2) 



Ky)-(3 
S/V5V28' 

0.675000 - 1 
2.660046/\/5\/28 



-1.445634. 



To interpret this statistically we need information 
concerning the distribution of possible values for t 
in the context where the true value of /3 is 1. To 
this end, some reference to large sample theory sug- 
gests the use of the standard normal distribution 
function, say $(•)• The observed value of this dis- 
tribution function then gives an approximation to 
the percentage of possible values of t that would 
be less than the observed t^, in other words, to the 
percentage position of the data with respect to the 
hypothesized value /3 = 1; this is called the observed 



p-value. Using the standard normal then as an ap- 
proximation, we obtain the approximate p-value 

PN = $(t°) = ^>(- 1.445634) 
= 0.07414 = 7.414%, 



(3) 



which records the observed level of significance in 
an elemental form, as just the percentage position 
of the data point or the probability left of the data 
point, under the hypothesis. 

A simple modification hopefully to accommodate 
the estimation of error scaling is obtained by using 
the Student(5) distribution function, say H^{-), as a 
revised approximation method. We then obtain the 
approximate p- value 

ps = ^5(i°)=^5(-1.445634) 
= 0.10395 = 10.395%. 



(4) 



An alternative to the direct use of the large sample 
distribution theory is provided by the bootstrap ap- 
proach. Using the least squares values, we separate 
the data values into a location or fit component yi 
and a residual component yi — jji- 



X —3 


-2 


-1 





1 


2 


3 


y -3.3479 


-2.6729 


-1.9979 


-1.3229 


-0.6479 


0.0271 


0.7021 


y~y 0.6679 


-1.3471 


-0.9121 


1.5429 


1.0279 


-0.3071 


-0.6721 



For the bootstrap we randomly sample the residuals 
with equal probability and add them back to the lo- 
cation component, thus obtaining a bootstrap data 
set, from which we calculate the bootstrap t-statistic 
value 

b* - 1 



where b* and s* are the regression coefficient and 
residual length from the bootstrap sample. We re- 
peated this bootstrap sampling a convenient total 
of iV = 10,000 times, and the empirical distribu- 
tion function was evaluated at the observed = 
— 1.445634. This gave an observed bootstrap p- value: 

F(t°) = 0.1051 = 10.51%, 



(5) 



PBS 



where F{t) is the empirical distribution function of 
the bootstrap t* values. 

With the present small sample size n = 7 we can 
calculate the bootstrap p- value exactly, pexBSj by 
using equal probability for each of the 7^ = 823,543 
possible bootstrap samples. We then take 

PExBS = {proportion(r < t°) 

proportion(t* < t°)}/2, 



(6) 
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Table 1 

Simple frequentist p-values for the regression example in 
Section 2 



(9) 



Measure of 
departure 



Distributional 
approximation 



p- value 
/i = l /:) = 1.5 /:) = 2 



f-statistic Normal 0.07414 0.0^121 0.0*189 

t-statistic Student(5) 0.10395 0.0^722 0.0^100 

f-statistic Bootstrap = 10* 0.10750 0.0^790 0.0^800 

t-statistic Bootstrap (exact) TV = 7'' 0.10332 0.0^833 0.0^888 

SLR Normal 0.05774 0.0^148 0.0*830 

A p-value records the percentage position of the data relative 
to a possible true value /? for the parameter; /3 = 1 is in fact 
the true value underlying the data set; /3 = 1.5 and /3 = 2 are 
other values that might have been of interest. Multiple zeros 
are indicated by a superscript, thus 0.0^148 — 0.00148. 

called a mid-p-value, and obtain 

(7) PExBS = 0.1033231 = 10.332%; 

with our particular rounded value there were no 
values at the boundary point. These four approx- 
imate p-values make use of a pragmatic choice of 
departure measure combined with distributional in- 
formation provided by Central Limit Theorem-type 
approximations or by resampling from the nonpara- 
metric maximum likelihood distribution; and they 
provide us with four determinations of the observed 
percentage position of the data relative to the model 
with P = 1. Other initial departure measures could 
have been considered, as well as other distributional 
approximations or determinations. 

These p-values for testing (3 = 1 are recorded in 
Table 1 together with a likelihood-based method dis- 
cussed in the next section. The table also records 
p-values for testing the more extreme /? values, 1.5 
and 2, with corresponding observed values = 
-3.669685 and t° = -5.893737. 

3. THE EXAMPLE: SIMPLE LIKELIHOOD 
DEPARTURE MEASURE 

More intrinsic departure measures are available 
from long available likelihood theory. The likelihood 
function in the present Student regression context is 



(8) 



L{a,f3,a;y) 



i=l 



7(72 



2 >, -4 



with log-likelihood 

^{a,(3,a■,y) 



a — 7 log a 



7 



4^1og 1 + 



i=l 



(yj-a- I3xif 

7(j2 



the observed likelihood L'^(a, /3, a) and observed log- 
likelihood £''(a,/3,cr) are obtained by substituting 
the data y^ = {yi,...,y^) from the data array (1) 
into the above expressions. 

In a general context the observed likelihood func- 
tion is given as 

L\e) = my') = cf{y';e) = f{e), 

which is the observed density function, that is, the 
statistical model f{y;9) examined at the observed 
data point y^; it records the amount of probability 
sitting at that data point, viewed as a function of 
possible values for the parameter. The constant c is 
taken as arbitrary but positive and indicates that 
only relative values from one 9 value to another are 
of relevance given the data point. If we consider in 
general how likelihood depends on data we can write 

L{e;y)=cf{y-e), 

£{e;y)=a + logf{y;e), 

where c > and a, c are otherwise arbitrary for each 
choice of data point y. 

Suppose now that we are interested in a scalar 
component tp = ip{0). Most likelihood methods make 
use of maximum likelihood values (MLEs); we do, 
however, avoid referring to them as estimates, as 
they are useful but typically not directly as esti- 
mates. We write 9 = argsupL(^) for the value that 
maximizes L{9). Also if tl^{0) is a component 
parameter of particular interest, we write 9^ = 
argsup^(g)=^ L{9) for the value that maximizes L{9) 
subject to the interest parameter iIj{9) having some 
value 'iij{9) = tjj oi special interest. Then based on 
the likelihood L[9) alone, an important departure 
measure of data from '(/'(^) = is obtained as the 
signed likelihood root (SLR) 



(10) = sign(V' - ^) [2{m - ^(0^,)}] 



1/2. 



this measure examines how probability at the data 
point under the full model exceeds that when 'il){9) 
is restricted to the value and theory has shown 
it to be fundamental. One way of viewing this mea- 
sure is to picture how much the log-likelihood rises 
from the maximum when V'(^) = V' up to the over- 
all maximum when 9 is unrestricted, that is. 
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Fig. 1. For the example in Section 2: the third-order Bayesian survivor function, the third- order frequentist p -value and the 
first-order SLR p-value. 



l{9^), often written £ — i. We take this rise as hav- 
ing quadratic form in terms of some quantity 
r^; solve for r^; and then attach the appropriate 
sign. For testing a true value tp{6) = the signed 
likelihood root r^, has to first order a standard nor- 
mal limiting distribution, a follow-through from the 
Central Limit Theorem. Related measures could be 
based on the slope of the log-likelihood at the tested 
value or on the displacement from 6^ to 6, but nei- 
ther has the same mathematical invariance or the 
same track record in applications. 

For many likelihood calculations, particularly re- 
cent higher-order calculations, the computationally 
challenging aspects often arise in the maximization 
steps rather than in other steps. 

For our simple regression example and the related 
likelihood calculations we now use maximum like- 
lihood rather than least squares variables; the ob- 
served values obtained by computer iteration are 

a = -1.3504512, 

P = 0.6504019, 

(T = 0.9641110 

for the overall MLEs, and 



d = 


0-13=1 


= -1.366699 


P = 


k=i 


= 1, 


a = 


<5"/3=l 


= 1.154527 



for the constrained MLEs when [3=1. In passing 
we note that the use of MLE variables can be con- 



venient but can be awkward when the error distri- 
bution itself has dependence on the parameter; for if 
the distribution for the error itself has dependence 
on the parameter rather than as here being just Stu- 
dent (7), then the maximum likelihood value could 
also have that parameter dependence and thus not 
be a statistic. From the preceding numerical values 
we obtain from (10) the signed likelihood root 

(11) r^=i = -1.574053. 

The corresponding observed p-value based on the 
first-order normal approximation for r is then 

(12) psLR = ^(r/3=i) = 5.774%; 

this is also recorded in Table 1. In Figure 1 we 
plot the SLR p-value against a full range of pos- 
sible (3 values; this is called the p-value function; 
some related determinations discussed in later sec- 
tions are also recorded in the figure. Other likelihood 
departure measures based on the score and maxi- 
mum likelihood estimates are sometimes considered, 
but they frequently have serious distributional and 
measurement bias difficulties. By contrast, the SLR- 
based approximate p-value uses a departure mea- 
sure that directly relates to the statistical model; 
it summarizes background information contained in 
the model combined with distributional information 
derived from the large sample behavior of the like- 
lihood function. 
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4. ANALYSIS WITH DEFAULT PRIOR 

An alternative first-order likelihood-based method 
comes from the use of a model-based flat prior called 
a default prior together with conditional-probability 
type calculations, typically referred to as Bayes al- 
gorithm. Let tt{6) be a weight function for 6 based 
on symmetries, invariance, or other relevant proper- 
ties of the model. The corresponding posterior dis- 
tribution viewed as providing inference information 
concerning 9 is 

(13) 7T{e\y^) = c7rie)Lie), 

where c now indicates the norming constant. This 
default approach was implicit in Bayes (1763), was 
strongly promoted by Laplace (1812) and Jeffreys 
(1946) and acquired the name inverse probability. 
More recently its development has been stimulated 
by the Valencia conferences (see, e.g., Bayesian Statis- 
tics 7, 2003); the default priors are called objective 
priors but in such contexts the objective can only 
refer to model structure, not to any objective fre- 
quencies in the context being examined. 

For inference concerning an interest parameter, 
say ip{0), one might then reasonably calculate the 
marginal posterior density function 

7r(^|yO)= 1 7Tie\y')dX, 

where A is a complementing nuisance parameter here 
chosen so that 9 is one-one equivalent to (^, A) and 
to have, say, Jacobian \d9/dip dX\ = 1 so that sup- 
port volume corrections can be ignored. 

This marginalization to obtain an inference dis- 
tribution for a component parameter can produce 
misleading results (e.g., Dawid, Stone and Zidek, 
1973). To overcome this issue, a preferred approach 
is to have a prior that depends on the particular 
parameter of interest, and thus to use a targeted 
prior 'K^{9) where the subscript indicates the partic- 
ular parameter being targeted (e.g., Jeffreys, 1946; 
Bernardo, 1979; Fraser et al., 2003). We do not ad- 
dress this important issue of choosing default priors 
but do acknowledge that it is of major interest for 
the Bayesian community at the present time and in 
part for the frequentist community. 

For our simple regression example we might pos- 
sibly consider the model-based default prior 7r(^) to 
be the invariant prior 

//,\ in da dp da da djS d\oga 
Tr{9) d9 = 5 = K ; 



this derives from parameter transformations on the 
sample and parameter spaces and is referred to as 
the left invariant prior (e.g., Jeffreys, 1946; Fraser, 
1979): under transformations that make location and 
scale changes on the initial sample space, the dif- 
ferential rewritten as, say, da d(3 da/a'^ remains 
unchanged, is invariant. This left prior avoids the 
marginalization issues for certain parameter com- 
ponents that are linear in a location parameteriza- 
tion implied by asymptotic theory (Fraser and Reid, 
2002); for many familiar parameters of interest, how- 
ever, it can lead to the marginalization issues; and 
furthermore it does not correspond to the confidence 
theory pivotal inversion based on the usual equa- 
tions {yi — a — I3xi)/a = Zi. 

For the parameter /3, various approaches (e.g., 
Dawid, Stone and Zidek, 1973; Fraser, 1979) sug- 
gest the targeted prior 

7r^(^) d9 = = da d(5 dlogcr, 

and for many parameters having a certain linearity 
it does avoid the marginalization issue; some discus- 
sion of this linearity and a related curvature measure 
may be found in Fraser and Reid (2002); the curva- 
ture issue does not arise in the present problem for 
the nice parameters a, a and we do not pursue the 
issue here. From the transformation viewpoint this 
is called the right invariant prior. The corresponding 
model-based posterior for 9 is then given by 

7r(a,/3,r;yO)d0 
(14) = ccj-^ 

• n|l + ^^^^^^^^^r'dad/^dlogcr; 

i=l 

for this, if we now take 9 to be (a, r) = (a, /3, logo"), 
the implied prior is ■k{9) = 1 and it conforms to con- 
fidence inversion. 

In order then to obtain the marginal density for 
the interest parameter /3, an integration over a and 
T is required. Repeated numerical integration over 
two dimensions can be quite feasible but often is not 
easily implemented; we next consider some alterna- 
tive integration procedures. 

For more general use of this Bayesian approach 
the choice of the default prior becomes a crucial is- 
sue and is the focus of much present activity in the 
Bayesian community; the term objective Bayesian 
prior is sometimes used in place of the term default 
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Bayesian prior, but this is misleading as the objec- 
tive would indicate that it is describing the physical 
context rather than being based as here on model 
characteristics, a level removed from the physical 
context. 

5. THIRD ORDER WITH DEFAULT PRIOR 

For many regular densities, posterior or otherwise, 
the Laplace integration method provides an accu- 
rate alternative route for obtaining the marginal 
density of a component, such as (3 in our case. As 
before, but in general notation, we use 7r('0, A) for 
the proposed prior and -^("05 A) for the likelihood. 
Then with third-order accuracy, the marginal pos- 
terior density for ■0 can be obtained by Laplace in- 
tegration over the nuisance parameter divided by 
Laplace integration over the full parameter, giving 



7r(V';y° 



(15) 



(27r)rf/2 



\jee\ y/'TTje^) 



\jxx{o^)\) m 



where k indicates a first-order constant: is the 
likelihood ratio quantity 

(16) rl=2{m-l{e^)} 

discussed earlier but now used more generally with 
ip of dimension d, nuisance parameter A of dimension 
?Ti, and p = d + m; the Hessian matrices 



0989' 
92 



dXdX' 

are information functions for the full and for the 
nuisance parameters, and have dimensions pxp and 
mxm; they are just negative second-derivative ma- 
trices of the log-likelihood function, and when eval- 
uated at 9 and 9^ give the observed information 
matrices 



(17) 



je9 = jee{0;y), 



Numerically, the information matrices can typically 
be computed by taking second differences based on 
very small and equally spaced values for each coor- 
dinate. 



The preceding marginal density can also be writ- 
ten 



7r(0;?/' 



(18) 



where 



(27r)'^/2 Lp(0) 



Lp (ip) = sup L(-i/), A) = ^(0, A^) 

A 

is the profile likelihood function for ip, obtained by 
maximizing the full likelihood over A for fixed value 
lp of the interest parameter ip{9). 

The methods inherent in the Laplace integration 
procedure can be described fairly easily. A regular 
function, here L{^,X) for fixed ip, whose logarithm 
has additive and maximum likelihood value proper- 
ties under increasing sample size n can be rewritten 

/(A)=e'^W 

= cexp{-jAAAV2} exp{aAV6n^/2 + 6AV24n} 

to third order as a function of A, with obvious gen- 
eralization for vector A; for this we are letting A 
designate a standardized departure of the original 
A from the maximizing value for /(A) with ip fixed. 
After expanding the second exponential in a power 
series and similarly for the log-prior, and then inte- 
grating term by term with respect to A, we obtain 
to fourth order, 

/(A)7r(0,A)dA 

= e^>(27r)-/2|j,;,(0)ri/2/(A;)^(0,A^), 

where j is the negative Hessian with respect to A as 
evaluated at the maximum for the fixed tp. The in- 
tegrations are based on simple reference to the mul- 
tivariate normal integral; for some background, see 
Strawderman (2000) and for some discussion of term 
by term integration, see Andrews, Fraser and Wong 
(2005). 

For a scalar interest parameter ip we can reason- 
ably be more interested in an integral of its den- 
sity function, and particularly in the right tail in- 
tegral called the posterior survivor function. Why 
the right tail? Consider the simple case of a vari- 
able X measuring a parameter ip with error density 
/(e) and distribution function F{e); we have: the 
observed p- value is p^{ip) = F^[ip) = F{x^ — ip); the 
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right tail posterior survivor function with a natu- 
ral flat prior is s{tp) = f{x^;a)da = f{x^ — 
a) da = F{x^ — ip); and these are equal. In more 
general model situations the ^3- value as discussed in 
the next section records in a statistical sense where 
the data value is with respect to ■0 in a left-to-right 
distributional sense and then corresponds as in the 
simple case to the survivor posterior value s{^l^). 

For the general case a highly accurate approxima- 
tion to the posterior survivor function is available 
from likelihood theory (see, e.g., Fraser, Reid and 
Wu, 1999, generalizing DiCiccio and Martin, 1991): 

s(0) = 1 - G(V^|y°) 

<I> ( r - log - ) = $(r*). 



For this, G designates the posterior distribution func- 
tion for ip, r is the signed likelihood root given 
as (10) in Section 3, qb is a score-type departure 
measure for ip, 



(20) 



where 



(21) 



is a score departure measure, and r* is implicitly 
defined. Note that for convenience we have taken 
the full parameter 9 to be given as {tp^ A')' in terms 
of the components, and this applies as well to the 
information matrices at (17). 

Also we have chosen to record the upper tail for 
presenting posterior probability from the posterior 
distribution; the interest parameter will often have 
physical meaning in a particular application and in- 
vestigators will think in terms of a variable mea- 
suring the parameter as, say, with a maximum like- 
lihood estimate. In such a framework the usual p- 
value is left tail for the variable and correspondingly 
is right tail for the parameter as in the location case; 
accordingly for harmony between the two inference 
approaches we take the reference Bayesian probabil- 
ity to be the upper tail survivor value. In the next 
section we will find that formula (20) using r from 
(10) and qb from (20) can be derived directly from 
frequentist formulas in the next section by acting 
as if vr(?/^; y") in (15) were obtained from a location 
model 'k{iJj — x; y^) with a nominal variable x taking 
the observed value x = 0. 



For our simple example and testing /? = 1, we have 
r = —1.574053 from (11) and we have 



(22) 
where 



= -0.9483686, 



£f3{§f^=i) = -5.868699, 



and the full and constrained information matrices 
for = (a, P, r) are 



(23) 



jee=jee{0) 



/ 5.7894195 1.311060 -0.3286837 \ 
911 1.311060 27.288552 -1.3395559 . 
V -0.3286837 -1.339556 12.1900132 / 



(24) hxiOp) = 911 



4.0240689 
-0.3319537 



-0.3319537 \ 
8.5925687 )' 



with corresponding determinants 



(25) 



|j9e| = 1892.702, 
bAA(^/3)| = 34.4669. 



This gives r* = —1.252169; the Bayesian posterior 
survivor value from (18) is then 



(26) 



sb(1) =0.1052542; 



this is recorded in Table 2 together with the Bayesian 
survivor values for testing /3 = 1.5 and (3 = 2, as well 
as some McMC validation results discussed in later 
sections. 

The first-order likelihood method in Section 3 re- 
quires the full and the constrained maximum like- 
lihood values and 9^ with of course correspond- 
ing values for the log-likelihood function. In order 
to take advantage of the approximate integration 
formulas in this section, we require in addition the 
second-derivative values at each MLE; such deriva- 
tives are of course also needed for familiar score and 
MLE departure measures and typically can be ob- 
tained by differencing. 

We can also calculate the Bayesian survivor value 
s(/3) for a range of values for (3. For our special ex- 
ample the Bayesian survivor function sb(/3) is plot- 
ted in Figure 1 together with the likelihood ratio 
p- value ^{r/s) and a third-order frequentist p- value 
to be discussed in the next section. 
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6. THIRD-ORDER p-VALUE 

Recent likelihood methods give highly accurate 
approximations for frequentist inference in much the 
same manner as for the Bayesian context just de- 
scribed. The methods require full and constrained 
maximum likelihood values 9 and 9^, as well as 
full and constrained information determinants. They 
also, however, need something more in the way of 
information from the model and data. The nature 
of this extra information can best be described in 
terms of parameterization scaling or reexpression. 
In particular, we need to express the initial parame- 
ter as a canonical-type parameter, say ip = (p{9). In 
the Bayesian context, such additional information is 
closely related to the development of an appropriate 
default prior; thus the use of a weighted likelihood 
L{9)'k{9) can partly be interpreted in terms of seek- 
ing a location reparameterization (3 = (3{9) such that 
7r(6) d9 = dp. Bayesian parameter reweightings have 
long been sought in the developmental sequence from 
invariant (Bayes, 1763; Laplace, 1812) to Jeffreys 
(1946) to reference priors (Bernardo, 1979). 

In the frequentist context the accessible reparam- 
eterizations are of an exponential rather than a loca- 
tion type, but they give some access to the related lo- 
cation information. The exponential-type reparam- 
eterization can be examined initially in the context 
of an exponential model. To this effect, consider an 
exponential model with canonical parameter ip, 

(27) f{y; (p) = exp{ip's{y) - K{ip)}h{y), 

where ip and s have the same dimension, say p. 
For such a model, the saddlepoint approximation 
(Daniels, 1954) can be remarkably accurate, and has 



the density form 
(28) f{y;ip)dy-. 

(29) 



(2^)P/2 
k/n 



2/2 



11 



1 1/2 



dip 



2/2 



\jipip\ ^ ds, 



(27r)P/2 

where is the likelihood ratio quantity for assessing 
the full parameter (p and j^^ = —{d"^ /dip'^)i{ip;y^)\^ 
is the observed information matrix from a data point 
y. The renormalized (28) is third-order accurate. 

For the important special case of a scalar parame- 
ter ip, a corresponding distribution function approxi- 
mation was developed by Lugannani and 
Rice (1980) and in an alternative form by Barndorff- 
Nielsen (1991). Both versions use the signed likeli- 
hood ratio r = r{ip,s) corresponding to (10), plus a 
maximum likelihood value departure q = q{ip, s): 

qf = {(p-ip)iy^. 

The approximate distribution function in the 
Barndorff-Nielsen (1991) form for or s is then 



(30) F{s;^) = ^ 



r — r ^\og — ) = <I>(r*), 
qfj 

and has third-order accuracy; this has the same form 
as (19) but uses a different departure q appropriate 
to the present frequentist context. The similarity of 
the Bayesian formula (19) and the above frequen- 
tist formula can appear more plausible by defining 
the following reexpressions of the variable and the 
parameter: 



b{s) 



jipip 



V^ds. 



Table 2 

For the simple regression example in Section 2, Bayesian s-values for assessing the values 13 =1, 
P = 1.5 and /3 = 2; using the third-order formula (19); using the McMC (Section 8 with Normal 
proposal); using AMcMC (Section 12 with adaptive choice of Student proposal) 



Test procedure 




s-value 






(3 = 1 


(3 = 1.5 


(3 = 2 


Bayesian: third order (A^ = 4 x lO") 


0.10525 


0.0^725 


0.0^923 


Bayesian: McMC (A = 4 x 10®, Normal) 


0.10744 


0.0^841 


0.0^118 


(Simulation SD) 


(0.0^484) 


(0.0^186) 


(0.0*789) 


{Acceptance rate} 


{41.9%} 


{41.9%} 


{41.9%} 


Bayesian: AMcMC (A = 4.10*', adaptive Student) 


0.10752 


0.0^836 


0.0^118 


(Simulation SD) 


(0.0^332) 


(0.0^100) 


(0.0''366) 


{Acceptance rate} 


{51.1%} 


{51.1%} 


{51.1%} 
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In terms of these, Welch and Peers (1963) showed in 
effect that b has a location model f{h — (3) to second- 
order accuracy. This has profound implications for 
possible second order agreement between Bayesian 
and frequentist methodologies, but Welch and Peers 
(1963) presented their results in terms of the fre- 
quentist approach of obtaining confidence bounds 
by integrating likelihood with respect to the Jeffreys 

1/2 

(1946) prior iip^p [ip)dip. The same result with some 
greater generality can be obtained by Taylor series 
expansion of an asymptotic model with scalar vari- 
able and parameter; simple reexpressions of variable 
and parameter show that to second order the model 
can be written either as a location or as an exponen- 
tial model; see, for example, Cakmak et al. (1998) 
and Andrews, Eraser and Wong (2005). A somewhat 
similar result is available with vector variable and 
parameter; see Cakmak, Fraser and Reid (1994). 

For the case of a location model f{s — (3) with a 
flat prior vr(/3) = c, the expression (20) for sim- 
plifies to 



-1/2 



As if3iP;s) = {d/dp)£{P;s) = -{d/ds)£{s - /3) = 
— ^;s(/9;s) from the location property, and —lp{l3) = 
i-s{fi,s) = (f from the exponential form (27), we ob- 
tain through simple algebra that gi = i which im- 
plies that the frequentist distribution function is 
equal to the Bayesian survivor function, as would 
be expected. In other words, this Welch and Peers 
(1963) Bayesian-frequentist equality is obtained from 
the Jeffreys prior for the scalar exponential model, 
and thus as demonstrated by Cakmak et al. (1994, 
1998) has an extension for general asymptotic mod- 
els. The advantages of this scalar parameter use of 
the Jeffreys prior has recently been discussed for 
the discrete binomial distribution context by Brown, 
Cai and DasGupta (2001). 

Now consider the vector exponential model (27) 
with p-dimensional canonical parameter 93 and p- 
dimensional canonical variable s, and suppose that 
we are interested in a scalar component parameter 
V'(v') having reasonable smoothness properties. 

The signed likelihood root r = in (10) is the pri- 
mary departure measure and a maximum likelihood 
departure 



(31) 



qi{ip) = sign(V; - V') 



lx-Xi/>l 



1/2 



is the secondary departure measure. For this, = 
(A, if)) has been presented as a combination of if) with 
a nuisance parameter A which complements the in- 
terest parameter tp] in the vector case we should 
perhaps write the combination in term of row vec- 
tors as 9' = (^', A'). The scalar parameter x{^) is a 
rotated coordinate of ip{9) that acts as a surrogate 
for 'ip{9) and has linearity in terms of '^{9). Explicit 
formulas for the surrogate parameter x{&) and the 
nuisance information are recorded in the Appendix; 
some discussion also appears in the next section; the 
parentheses around (AA) are to indicate that the in- 
formation has been recalibrated in terms of the new 
parameterization ip. All of this is easily accessible 
numerically, and uses primarily just the typical in- 
gredients of the Bayesian-type approximation. 

7. THIRD ORDER FOR THE EXAMPLE 

We have just described how third-order |3-values 
are available to assess a scalar parameter ipij)) in an 
exponential model. While exponential models are of 
course quite important, they do represent a very spe- 
cialized type of model. However, recent likelihood 
theory has shown that for a general continuous sta- 
tistical model together with data, there exists a cor- 
responding exponential model that provides highly 
accurate third-order p- values for the original model 
and data, using the formulas in the preceding sec- 
tion. 

For our example in Section 3, the corresponding 
exponential model with data has the same observed 
log-likelihood i{9) = log f{y^; 9) given as (9) and has 
a nominal reparameterization if' = (ipi,ip2, ^3) given 
as a row vector; some discussion is given later. See 
also Davison, Fraser and Reid (2006). The reparam- 
eterization is 



(32) 



if' {a, (3, a) 



7 



{a + l3x^-yf)/7a^ 
'^l + (yO-a-/3rE,)V7a- 



"(1) ^ii '^i 



and is explained later in detail; here dP is just the 
standardized residual vector (y^ — y^)/a^ recorded 
numerically preceding (35). The corresponding gen- 
eral formulas are recorded at the end of this section. 

To obtain the p- value for assessing any scalar com- 
ponent parameter, it suffices to treat the observed 
likelihood as a function of ip, which of course means 
explicitly or implicitly that the observed informa- 
tions needs to be reexpressed or recalibrated in terms 
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Table 3 

Frequentist p-values for the simple regression example in Section 2, for assessing the values 
P = 1, /3 = 1.5 and P — 2: using the third-order formula (30) with (11) and (31 ); using McMC 
with normal proposal centered at a value at hand with standard deviation 0.35 (Section 8); 
using McMC with a centered Student(7) proposal; and using McMC 
using an adaptive Student proposal (Section 13) 



Test procedure 




p- value 






= 1 


= l.b 


= 2 


Frequentist; third order 


0.10525 


0.0^725 


0.0^923 


Frequentist; McMC (Af = 4.10^, Normal) 


0.10832 


0.0^819 


0.0^118 


(Simulation SD) 


(0.0=^398) 


(0.0^109) 


(0.0*406) 


{Acceptance rate} 


{38.0%} 


{38.0%} 


{38.0%} 


McMC [iV = 4.10^ Student(7)] 


0.10765 


0.0^827 


0.0^113 


(Simulation SD) 


(0.0=^196) 


(0.0*510) 


(0.0*185) 


{Acceptance rate} 


{75.9%} 


{75.9%} 


{75.9%} 


AMcMC (^' = 4.10^ adaptive Student) 


0.10792 


0.0^823 


0.0^109 


(Simulation SD) 


(0.0^204) 


(0.0''646) 


(0.0*264) 


{Acceptance rate} 


{81.6%} 


{81.6%} 


{81.65%} 



of the ip parameterization and the maximum hkeh- 
hood departure needs also to be expressed in the f 
parameterization. 

The recalibration of the information is obtained 
from the derivative ^pe{(^) = dip/dad(3 da of ip with 
respect to the initial parameters as evaluated at the 
maximum likelihood values. For this with ip = 
X' = (a, logo"), we obtain 



MO' 



5.7894195 
1.3110600 
-0.3286837 



1.311060 
27.288552 
-1.339556 



-0.3286837\ 
-1.3395559 , 
12.1900132 / 



4.0240689 -0.3319537' 
-0.1474505 -7.5706463 
-0.5188019 7.5500100 



Using these as scaling matrices along with the ma- 
trices (24) gives the recalibrated information deter- 
minants: 

Uwl = bee I be r' = 0.000528345, 

li(AA)(^/3=l)l = |jAA(^/3=l)||'/'>Ar' =0.01844021. 

The special maximum likelihood departure used in 
(31) is sgn(/3-/3)|x-X/3| = -5.602751. We then use 
r^=i = —1.574053 from (11) together with 



(33) 



9/3=1 



-0.9483686 



from (31) to substitute in (30); this gives the third- 
order p- value 



(34) 



P3rd = 0.10525, 



which is recorded in Table 3 along with other values 
including those for testing the parameter values f3 = 
1.5 and f3 = 2. 

We can also calculate the third-order frequentist 
p-value p{P) for a range of values for /?; for our ex- 
ample, this p-value function is plotted in Figure 1 
using dots to allow for comparison with the Bayesian 
s(/3) obtained in Section 4. 

We record now some general thoughts on the repa- 
rameterization ip{6). In the context with indepen- 
dent scalar coordinates, we have 



=1 



dyi 



dm 

,0 de 



The first factor is a function of 9 that records how 
the ith coordinate influences the likelihood function: 

— ^ = ^^ogfi{yi;ey, 

oyi oyi 

it is the coordinate gradient of likelihood and can 
be viewed as a parameter when the observed data 
values are substituted. For our example we have 



dmvi) 



dyi 



(a + /3rr,-j/f)/7a^ 
\ + {y^^-a-l3x^Y/7a^ 



and it appears in (32) above. The second factor in 
(32) is a numerical row vector that records how pa- 
rameter change near the overall maximum likelihood 
value affects the ith coordinate; it records the sensi- 
tivity of the ith. coordinate to parameter change at 
the maximum likelihood value. This uses the error 
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Zi as an zth coordinate pivot, which with continuity 
is necessarily one-one equivalent to the distribution 
function Fi{yi;6); and then with the pivot inverted 
to express yi = yi{9, Zi) in terms of 9 and Zi we ob- 
tain the derivative 

dyi 



for the ith. coordinate at the data point. For our 
example with Zi = {yi — a — I3xi)/a we obtain 



and then have 



yi = a + (3xi + az^ 



dyi 
da 
dyi 

d[j 

dyi 

da 



Xi 



Zi. 



Then for evaluation at [y^, 9^) we need the observed 
standardized residual z^ = (f = d{y^): 

d° = {cii(y°),...,(i7(y°)}' 

= (0.5614092,-1.1324253, 

-0.7667588, 1.2969452, 

0.8640297, -0.2581882, -0.5650118)' 

as calculated from 



(35) 



_ y^ 



fi^x. 



We thus obtain the final vector in {l^Xi,d^) as used 
in (32). 

For some background theory see Fraser and Reid 
(1993, 1995, 2001) and for an overview of the method- 
ology for the regression context see Fraser, Reid and 
Wong (2005), Fraser, Wong and Wu (1999). 

8. THE EXACT p-VALUES AND s-VALUES 

Higher-order p-values and higher-order posterior 
survivor s-values are usually validated by simula- 
tions, by verifying that the large sample distribution 
in each case is close to the uniform(0, 1) distribution. 
The formulas, however, have been developed in the 
context of a conditional model: in the Bayesian con- 
text it is conditional on the full data, and in the fre- 
quentist context it is conditional on data indicators 
of statistical precision, which typically are given by 



exact or approximate ancillaries that reflect struc- 
ture and continuity of the model with respect to the 
variable and the parameter. 

For a general frequentist context, the approximate 
ancillaries for third-order inference are well defined 
theoretically but are only needed near the observed 
data point and typically are only available near the 
observed data. In such contexts this can make a 
full conditional validation unattainable as typically 
there is no accessible information concerning other 
conditioned points, beyond the tangent direction at 
the data. Our example, however, has special lin- 
ear and transformation properties that do provide 
an exact conditioning variable, here d{y)^ and thus 
an exact conditional distribution; some background 
and details are recorded in the Appendix at point 
(i). Again as in Section 2 we describe the model 
in terms of the convenient least squares coordinates 
(a, 6, s), and then record the density for just the 
standardized or null case with a = 0, /3 = 0, cr=l; 
the conditional density for (a, 6, s) is 

7 

(36) g{a,h^s) = c'^h{a + hxi + sdi^s'^, 
1=1 

where as before h{z) is the Student (7) density; this 
is derived in Fraser (1979) and discussed briefly in 
Fraser (2004). Note that we could also have used 
maximum likelihood variables, as the error distribu- 
tion is free of the parameter, but the least squares 
variables have convenient simplicity; the nonnull con- 
ditional density is then available directly as 

a-^g{a-\a-a),a-^{b- p),a-^s}. 

The null and nonnull distributions can also be ex- 
pressed directly in terms of the observed likelihood 
function L^{a, /3,a), by simple change of argument; 
for details, see the Appendix at point (i). 

More generally, for a regression model y = XP + 
az where z has error density f{z) = YVi=ig{zi) and 
X is n X r with full column rank, the conditional null 
density for the least squares (6, s) given the observed 
value of the residual vector d = {y — Xb) is 



(37) 



c/(X6 + sd°)s"- 



which is the original density reexpressed in terms of 
the new variables coupled with a Jacobian scaling 
factor with power equal to the effective number of 
coordinates that are conditioned. For sample simula- 
tions, next to be discussed, we will, however, switch 
from s to log s to obtain an unbounded range, with 
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corresponding effect on the density expressions (36) 
and (37). 

To assess the accuracy of the third-order Bayesian 
values from Section 5 or the third order frequentist 
p- values from Section 6, we will use large-scale com- 
puter simulations from the posterior density (15) or 
from the precision-based conditioned density (36) 
or (37). While for our example with three coordi- 
nates, numerical integration would be feasible, we 
choose the more flexible and generally available 
Markov chain Monte Carlo (McMC) sampling pro- 
cedure. 

For the McMC sampling we will refer to the dis- 
tribution to be sampled as the target distribution 
or target density and use the notation g{y); in our 
case and frequently in general such target distribu- 
tions come to us unnormed, that is, we do not know 
the value of the integral / g{y)dy. We describe a 
procedure for successively obtaining sample values 
yi,y2, — In particular, with a given sample value 
yi in hand, we sample from a normal distribution 
located at that sample value with coordinate stan- 
dard deviation, here say 0.35, to obtain a possible 
next sample value; this normal distribution is called 
a Gaussian proposal. We then use a ratio of likeli- 
hood at the possible new value to likelihood at the 
value in hand to decide whether the next value yi+i 
is to be the just-obtained trial sample value or is to 
be a repeat of the value in hand; the likelihood ratio 
is called the Metropolis-Hastings criterion. We are 
thus using a random walk Metropolis (RWM) algo- 
rithm with a Gaussian proposal distribution to gen- 
erate a sequence [y^, . . . , y^) of points where y here 
refers to the variable in the target distribution being 
sampled, that is, the posterior (15) or the modified 
version of the three-dimensional distribution (36) or 
(37). The limiting distribution of the sequence ap- 
proximates the distribution of the target but does 
have serial correlation that complicates the estima- 
tion of the effective simulation sampling variance. 
An alternative and convenient proposal distribution 
is the uniform proposal, a uniform distribution cen- 
tered again at the value in hand with range here, say, 
1.50. For a recent overview see Robert and Casella 
(2004). 

To estimate the true p-value based on the ob- 
served t-departure from our original data set, we 
then check for each sample point whether t in (2) 
calculated from a simulated y is less than the ob- 
served —1.445634; the simulated exact p-value is ob- 
tained as the proportion satisfying the inequality. 



We will also report the estimated simulation stan- 
dard deviation; results are recorded in Table 3 to- 
gether with the third-order p-value from Section 7 
and some other values. The simulation size was A'" = 
4,000,000; in the sample sequence we would dump 50 
values, then retain 950 values and repeat this pat- 
tern. From this sampling pattern we were able to 
obtain an estimate of the simulation standard devi- 
ation, using the 4000 repeats of sample means from 
batches of 950; for some details, see the Appendix at 
point (ii). The table also records p- values for testing 
(3 = 1.5 and 13 = 2.0 using corresponding observed 
values tO = -3.669685 and t° = -5.893737. 

9. PRECISION INFORMATION AND 
BAYESIAN-FREQUENTIST AGREEMENT 

With continuous parameters and theory based 
nominally on increasing amounts of data we have 
noted that p-values for scalar parameters are avail- 
able with third-order accuracy. Similarly, upper tail 
posterior values or s- values for scalar parameters are 
also available with third-order accuracy assuming of 
course the acceptability of the prior. In the default 
prior community, it seems acknowledged that the 
choice of sensible prior needs to be based on the pa- 
rameter of interest, in other words, targeted on the 
parameter of interest; the development of targeted 
default priors will be examined separately. 

For the frequentist approach we have noted that 
third-order methods relate implicitly to condition- 
ing on precision information obtained from the data; 
and the conditioning effectively reduces the dimen- 
sion of the active variable to the dimension of the 
parameter; for some recent discussion see Casella, 
DiCiccio and Wells (1995) and Fraser (2004). 

In a paper presented at the International Work- 
shop on Objective Bayesian Methodology at the Uni- 
versity of Valencia on June 13, 1999, one of the 
present authors discussed strong matching, defined 
to be the effective equivalence of the Bayesian s- 
value and the frequentist p-value. The issue in the 
Bayesian context of having the choice of prior also 
refiect conditioning on precision information pro- 
vided by the model and data was mentioned by the 
presenter and independently by the discussant T. 
Severini of Northwestern University. The issue cen- 
tered on a model with scalar parameter 6 and a data 
precision indicator a such that the actual measure- 
ment of 9 was made by the submodel /i (?/; ^) if a = 1 
and by submodel /2(y; 6) ii a = 2; the data indicator 



14 



M. BEDARD, D. A. S. ERASER AND A. WONG 



had a fixed distribution equivalent to the toss of a 
fair coin. This random choice of measurement model 
was proposed by Cox (1958). 

Suppose that the model fi{y',0) has information 
ii{9) and the model /2(y;^) has information 12(0): 
the information for the composite model is then 
i{6) = {ii{9) + 12(0)} /2. The Jeffreys prior gives the 
posterior 

(38) p{e\y,a) = cf{y,a;e)i^/\e), 

and it has a second-order location relationship with 
a reexpressed 9 (Welch and Peers, 1963). This 
Bayesian posterior distribution is of course condi- 
tional on the observed data, but the indicated choice 
of prior does not reflect the information that the 
data has identified the model type that actually 
made the measurement. If the corresponding infor- 
mation is used to assist the determination of the 
prior, then the posterior, with reference to Jeffreys, 
would be 



(39) 



p{e\y,a) = cf{y,a;9)il/\9), 



where a has its observed value; see also Fraser (2004). 

Clearly (38) and (39) differ whenever ii{9) differs 
from i2{9)- Should the default Bayesian allow the 
default or invariant prior to depend on information 
provided by the data? 

At the workshop there was some acknowledgment 
of a place for such precision information in the choice 
of default prior. If such conditioning is accepted 
among Bayesians and frequentists, then agreement 
to third order is possible: the p-values are equal 
to the s-values and the professional disagreement 
would seem to vanish. What then seems clear in 
the context of continuous parameters and modest 
regularity, is that the frequentist p-values and the 
Bayesian s-values become equal if the frequentist 
accepts conditioning on observed precision informa- 
tion and the Bayesian suitably targets his default 
prior, responding to similar information. 

10. BAYESIAN POSTERIOR MEANS AND 
VARIANCES 

We have been discussing the use of model preci- 
sion information for the choice of a default prior. 
Now suppose we take a prior for convenience or ex- 
pediency or otherwise, and wish to obtain some gen- 
eral posterior characteristics such as posterior means 
and variances. Means and standard deviations can 
often provide a convenient summary for purposes of 



inference, both frequentist and Bayesian. The data- 
accretion techniques apply to likelihood of course, 
and consequently to weighted likelihood as given 
by a posterior. Accordingly we discuss briefly how 
these large-sample-type techniques can help in the 
Bayesian context. 

Consider a component scalar parameter 1^(9) and 
suppose we want just its mean and variance 



M = E^{9) = J c'^{9)L{9)7T{9)d9, 
V = E{i){9) - M}^ 

c{i;{9)-M}^L{9)7T{9)d9, 



where c here is the norming constant for the pos- 
terior distribution. We have of course the option of 
extensive McMC simulations. We first, however, ex- 
amine higher-order likelihood-based methods that 
can be applied or adapted to this purpose. 

From Section 2 and assuming we have a conve- 
nient nuisance parameterization A, we obtain 



(40) /(^A) = ce"''^ 



lie 



1/2 



vr 



as the third-order posterior density approximation 
when renormalized, and 



(41) 



F(V^) = l-$ 



log — 



as the third-order posterior distribution function ap- 
proximation; the signed likelihood root is given 
by (10) in Section 3 and the adjusted maximum like- 
lihood departure by (20) in Section 4. With third- 
order accuracy for / and F we have third-order ac- 
curacy available in principle for obtaining the means 
and variance. A generating-type function to accom- 
plish this would be appealing but seems inaccessible. 
Using the distribution function directly, however, we 
do have the following reexpressions: 

roo 

(42) E{i;)= {l-F{^p)-F{-^l^)}d^|;, 
Jo 



(43) E{ij')= / {1 -F(V') + F(-V')}2V'(iV- 
Jo 

As part of the usual computation of quantities such 
as the distribution function F{tp), a familiar numer- 
ical practice is to evaluate the quantity at equally 
spaced points, say 

. . . , -00 - 2(5, -(Ao - (5, V'o, V'o + (5, iiQ + 25,... 
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taken about some convenient central value ■00 using 
a small value for 6. Let 

■ • ■ , F~2, F-i, Fo, Fi, F2, ■ ■ ■ 

designate such distribution function values, which 
can conveniently be stored in a file. We then have 
that 

E{^-^o) = 5{--- + F^2 + F^i 

+ (1-Fi) +2(1 -F2) + •••}, 
Eii^ - ^0? = 2<^H- • • + + 1 • F-i 

+ 1-(1-Fi) + 2(1-F2) + ---} 

are available immediately by cumulative sums 
through the stored file; we then directly obtain E{'il:) 
and V{iIj). 

For our regression example as discussed in Sec- 
tions 2 and 3, we have used a conventional default 
prior for the Bayesian considerations in Section 4. 
The corresponding posterior survivor function for (3 
was recorded as Figure 1 in Section 3. We processed 
the related file by the above summation formulas 
allowing for the fact that F{[3) = 1 — sb{(3) and ob- 
tained the values in Table 4, column I. 

We also differenced the distribution function val- 
ues to get density values and used them for ordinary 
numerical integration to obtain the mean and vari- 
ance; the results are recorded in column II. 
As a more direct numerical approach, we used the 
estimated density / as given by (40) to obtain an 
alternate value by ordinary numerical integration; 
the results are recorded in column III. 

Finally we used the Markov chain Monte Carlo 
methods as described in Section 7 to simulate these 
posterior means and variances; the simulation size 
was N = 4,000,000. The results for the normal sam- 
pling proposal are recorded in column IV of Table 4; 
the results for the uniform sampling proposal were 



Table 4 

Third-order posterior mean, variance and standard deviation 
by method I, II, III; validation by McMC with N = 4.10'^ 



Computation method 


I 


II 


III 


McMC 


Mean = E{l3) 


0.67642 


0.67639 


0.67208 


0.67172 


Simulation SD 








(0.0^550) 


Variance = V{l3) 


0.08096 


0.08101 


0.08615 


0.08436 


Simulation SD 








(0.0^665) 


SD = SD{l3) 


0.28453 


0.28463 


0.29352 


0.28642 


Simulation SD 








(0.0^763) 



very close. We note the high accuracy of the third- 
order procedures relative to the simulated exact, but 
do not here attempt a detailed comparison of I, II, 
III. 

11. SOME THOUGHTS ON McMC 

Consider a target density g{y) that we wish to 
sample from: for the example the target g{y) is given 
by (14) for the Bayesian approach and by (36) for 
the conditional frequentist case. In both cases the 
g{y) is unnormed; it is a relative density function. 
The sampling difficulty for simulations is typically 
due to the fact that the target density is not a prod- 
uct of independent variables with the related ease of 
sampling coordinate by coordinate. For notation we 
will assume that g{y) is normed, but this will not 
be used other than to facilitate the discussion. 

In this section we briefly discuss the McMC method- 
ology from a statistical rather than probabilistic point 
of view and accordingly use notation that is more 
statistical. This seems particularly appropriate in 
the present context of comparing statistical infer- 
ence from the Bayesian and frequentist approaches: 
both give unnormed density functions that are con- 
ditional. The large sample techniques give highly 
accurate third-order results; these can be assessed 
by McMC and improvement can be obtained by in- 
creasing the simulation size A^. 

The theme behind the Markov chain Monte Carlo 
procedure is to use an accessible density function, 
say f{x\y), to produce a possible value x for the 
next value in a sample sequence, based of course on 
the most recent value, say y. This accessible den- 
sity is typically taken to have an amenable product 
form with independent coordinates. In the McMC 
sampling process with values y^, . . . , in hand, we 
sample from the proposal density /(x|y") to obtain 
a candidate x for the next sample value. This can- 
didate will either be accepted with an acceptance 
probability A{x\y^) with the result that is set 
equal to x, or be rejected with complementary prob- 
ability 1 — A{x\y"') with the result that y^~^^ is set 
equal to y" which is a repeat of the value in hand. 
The acceptance probability is often taken to be a 
Metropolis-Hastings ratio (Metropolis, 1953; Hast- 
ings, 1970), now to be described. 

For discussion we let g{x) and f{x\y) designate 
probabilities of being in a neighborhood of a point 
X. Of course we should properly write g{x)A and 
f{x\y)A, where A is a small volume element at the 
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point X, but all the A's will cancel and expressions 
are easier without them. 

Consider two sample space points a and 6, and 
how a next sample point might be a transition or 
a repeat among these points. If we are at a first 
time point with data value a and sample from the 
proposal f{x\a) when the target to sample from is 
g{x)^ then the likelihood ratio 



L{h) 



f{b\a) 



records how things ideally should be scaled to agree 
with the target at b. Alternatively, if we are at the 
same first time point with data value b and sample 
from the proposal f{x\b) when the target to sample 
from is g{x), then the likelihood ratio 

records how things ideally should be scaled to agree 
with the target at a. The ratio of these likelihoods 
is called the Metropolis-Hastings ratio, 



MH{b\a) 



m 

L{a) 

g{b)/f{b\a) g{b)f{a\b) 



g{a)/f{a\b) g{a)f{b\ay 

it gives us the mechanism to adjust the transition 
from a to 6 to give conformity to the target density 
g{y)] its reciprocal addresses the transition from b to 
a. Accordingly, the acceptance probability for going 
from a to 6 is taken to be MH{b\a) but capped at 
the maximum 1 possible for a probability: 



A{b\a) = MH{b\a), 

where the bar indicates the capping: A = min(^, 1). 
The capping can of course give a shortfall in tran- 
sitions from a to 5 but we see that the related re- 
jection and repeat of the preceding value precisely 
compensates. 

Consider the typical case where the proposal f{x\y) 
does not duplicate the target g{x). And without loss 
of generality, consider a pair of points a, b where 
L{a) > L{b) or 



MH{b\a) = A{b\a) 



9{b)f{a\b) 



< 1. 



g{a)f{b\a) 

Suppose the probabilities for the sampling process 
are correct at some possible point in time, that is, 
they are equal to g[a) and g{b) corresponding to a 



and b for that time point. Then suppose we consider 
transitions within the pair {a, 6}. Concerning a tran- 
sition going from a to 6, the probability of being at 
a and going to b is 

g{a)f{b\a)A{b\a)=g{b)f{a\b)- 

this represents a probability increase at b and prob- 
ability loss at a. Concerning a transition from b to 
a, while noting that A{a\b) = 1, the probability of 
being at b and going to a is 

g{b)f{a\b)A{a\b)=g{b)f{a\b)- 

this represents a probability increase at a and prob- 
ability loss at b. We note that the two probability 
movements cancel each other and thus the probabil- 
ities at a and b are maintained; we do note, however, 
that the rejection probability 1 — A{b\a) represents 
a loss of new sampling information. One can thus 
view the acceptance probability as an effective ad- 
justment of the proposal f{x\y) to yield proper tran- 
sitions between pairs of points so as to accomplish 
what is prescribed by the target g{y). 

12. ASYMPTOTIC McMC 

For our example, the large sample likelihood-based 
methods gave us a Bayesian analysis in Section 4 and 
a frequentist analysis in Section 6. In both cases, we 
used Markov chain Monte Carlo methods for vali- 
dation: in the first case we sampled the unnormed 
posterior tt{0)L{9) given by (14) which is a condi- 
tional distribution given the data; in the second case 
we sampled an unnormed sample space conditional 
distribution given by (36) which is conditional on 
observed precision information. Of course the ex- 
ample is sufficiently low dimensional that numerical 
integration could have been used, but McMC is eas- 
ier to implement and readily extends to larger and 
more complicated sample spaces and target densi- 
ties g{y). We now examine how we can make use 
of the asymptotic form of the target distribution to 
give a more efficient version of the proposal distri- 
bution. For our example, the Metropolis-Hastings 
acceptance rates were approximately 38% for the 
normal proposal and 25% for the uniform proposal, 
and both yielded reasonable convergence rates. We 
now investigate ways to smartly increase this accep- 
tance rate, by introducing a proposal density that 
generates wiser moves and thus improves the preci- 
sion of the McMC sampling process. 

For our asymptotic context we have that the un- 
normed density g{y) has a maximum density value 
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at a point which we designate as y, and has a neg- 
ative Hessian designated as j = — log g{y)/dy dy' 
as evahiated at the maximum y. We now assume 
that the variable y has dimension, say, d, and inves- 
tigate the choice of an expedient proposal f{x\y). 
In Section 7, we used a proposal that was a prod- 
uct of independent normals and another that was a 
product of independent uniforms: 



(44) 



where h is either normal or uniform centered at y", 
with scaling chosen pragmatically; in the present 
case X and y have dimension d with coordinates 
xi, . . . ,Xii and yi, . . . ,yd- Of course a proposal that 
mimics the target g{y) would have advantages in ef- 
ficiency, giving candidate sample values that tend to 
agree with the target g{y) and thus have less loss. 

A simple choice for the proposal f{x\y) is the 
N{y;j~^) distribution which does not depend on 
the preceding value. Such a multivariate normal is 
available in many computing packages for random 
sampling and has here both the same point of max- 
imum value and the same local scaling as the target 
distribution. The normal, however, has short tails 
and thus in sampling would often neglect the tails 
of a target distribution g{y), with loss of efficiency 
perhaps serious in some contexts. 

A more refined choice is the multivariate Student 
distribution with degrees of freedom chosen prag- 
matically to give longer tails (see Brazzale, 2000); 
this proposal is purely for the McMC simulations 
and does not relate to the objective of interest, the 
p-value, although it does affect the McMC simu- 
lations, as we will see. A canonical version of this 
Student distribution with degrees of freedom, say, / 
is designated Student/ (0; /) and has density 

r((/ + d)/2) 



h{T) 



(45) 



7r'^/2r(//2) 

(i + r2 + ... + r|)-(/+'^)/2; 



sample values for this can be obtained as 



T': 



Zl_ Zd_ 

where the Zi are independent standard normal and 
Xf is a chi variable with / degrees of freedom, both 
easily accessible in computer packages. The nega- 
tive Hessian of the canonical log-density from (45) 
is (/ + d)I and for use in the present context would 



need to be adjusted by scaling and also of course 
by location to give the desired location and Hes- 
sian to match the target. Accordingly, we take the 
modified proposal f{x\y) replacing (44) to be the 
Student/{y, (/ + d)j~^} with values available as 



(46) 



x = y + {f + d)'/'r'/'T 

= y + W/xf, 



where T designates a vector from the canonical 
Student/ (0, 1), w designates a value from the mul- 
tivariate normal MV{0,j~^), W designates a value 
from the MA^(0, (/ + d)j-^) and j^/^ ^ suitable 
square root matrix of j. A pragmatic choice for the 
degrees of freedom / could allow for thicker tails and 
provide improved sampling coverage of extremes. 
For our example we simplistically chose the degrees 
of freedom / to be the degrees of freedom 7 that 
was used originally to generate the individual coor- 
dinates. 

We then applied the McMC procedure using the 
Metropolis-Hastings ratio and sampled N = 
4,000,000 times using the dump 50, keep 950 pro- 
cedure as described earlier; we then calculated the 
proportion of values with 

t< -1.445634, 

or equivalently with 

6/s< -0.122178. 

We obtained the p- value p = 0.10765 with simulation 
SD = 0.000196, along with an acceptance rate of 
76%; see Table 3. Values are also recorded for testing 
/3 = 1.5 and (3 = 2. Our view verified so far is that 
the more the proposal mimics the target, the higher 
the acceptance rate will be. To obtain high accuracy, 
very large values of are needed so any increase in 
efficiency has merit. We discuss this briefiy in the 
final discussion section. 

13. ADAPTIVE MCMC 

The use of a RWM sampling proposal f{x\y) as 
in Section 7 is in its nature adaptive, as it samples 
near the most recent sample value. We modify this 
adaptive procedure by having the proposal mimic 
the target g{-), that is, by having the same shape at 
the maximum and the same drop-off to the current 
value y in hand. We do this by centering and shaping 
the proposal as in the preceding section, but also by 
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determining the degrees of freedom / to duplicate 
the drop-off 



(47) 



9{y) 
aiy) 



1 + 



(y_yyj(y_y)|-(/+<i)/2 

f + d 



from the maximum to the current point y; we then 
have that the Student proposal has the same maxi- 
mizing value and has the same Hessian as the target, 
but now also has the degrees of freedom to duplicate 
the tail thickness at the current point in hand. For 
pragmatic reasons we take / to be the nearest in- 
teger to the solution of (47) but restrict it to the 
range from, say, the Cauchy with / = 1 to the near 
normal with / = 50. If we let = 2log{g{y) / g{y)} 
be the target likelihood ratio quantity and = 
{y — y)' j{y — y) be the quadratic departure for the 
Student, we can solve for f + d = f using 



(48) 



/log 1 + 



2 



by a simple scan of integer values for / in {1,2,..., 
50}. 

This adaptive McMC then proceeds as follows: if 
the ith sample value y^ = y, we solve for an integer 
/ and then / using (48) and (47) and obtain a trial 
value X for the next observation by sampling from 
f{x\y) taken to be the Student j {y, {f + d)j~^) dis- 
tribution using one of the data generation methods 
in (46). 

For the example, we now apply this adaptive pro- 
cedure to the conditional distribution (36) in Sec- 
tion 7 and examine the proportion of values with 

t< -1.445634 

or equivalently with 

b/s < -0.122178. 

With N = 4,000,000 and using the dump 50, keep 
950 procedure as before we obtain p = 0.10792 with 
SD = 0.000204, along with an acceptance rate of 
82%, substantially more than with preceding meth- 
ods. This is recorded in Table 3 together with values 
for assessing /? = 1.5 and (3 = 2. 

14. CONTROVERSIAL EXAMPLE: 
BEHRENS-FISHER 

(with Ye Sun, York University) 

In a recent study of controversial examples in statis- 
tics (Fraser, Wong and Sun, 2007), extensive simula- 
tions were performed on some recent procedures for 



the Behrens (1929)-Fisher (1935) statistical prob- 
lem. This problem concerns a sample of ni from a 
Normal(/ii , ai) and a sample of n2 from a Normal(//2) 
(72 ) and addresses inference for the difference 6 = 
fii — fj,2 of the population means. The statistical 
model is simple, just two normals, but clearcut pro- 
cedures for inference have been elusive. 

Fisher (1935), following Behrens (1929), suggested 
that the confidence distribution for /ii be convolved 
with the confidence distribution for ij,2 to target the 
difference 5 = — ^2- This combining of confidence 
distributions ran contrary to statistical practice at 
the time and evoked an extensive literature response 
which we do not explore here. 

Jeffreys (1961) recommended the use of the prior 

(yi'^(y2^ dfii d(Ti dfi2 da2, 

which is the combination of the right invariant pri- 
ors for the two normal models. Such right invariant 
priors are common priors for default Bayesian analy- 
sis; also the right invariant prior for a normal model 
can be seen to reproduce Fisher's confidence distri- 
bution for the corresponding mean. 

Ghosh and Kim (2001) proposed a second-order 
default prior 

o'Y^o'2^ (cf /m + a\/n)d^ti d^2 dai da2 , 

which has somewhat the form of a weighted average 
of the two component right invariant priors. 

The signed likelihood ratio (10) examined in Sec- 
tion 3 can provide first-order p- values and confidence 
intervals. The /3-level confidence interval for the dif- 
ference 5 in means has the form 

{^■Z-a/2<n < Za/2), 

where (^;-q/2i -^0/2) is a /3 = 1 — q interval for the 
standard normal and rs is the signed likelihood ratio 
(10) for assessing 5. 

The third-order likelihood methods in Section 6 
use the signed likelihood ratio rs together with the 
maximum likelihood departure qs formula (31), and 
then combine them using Barndorff-Nielsen's (1991) 
formula (30) to obtain an r| for assessing the 
Behrens-Fisher 6. The corresponding (3-level con- 
fidence interval is 

{6:z_a/2<rs < Za/2). 

These methods were compared in Fraser, Wong and 
Sun (2007) using a simulation size of = 10,000. 
The third-order methods generally performed well, 
especially with increasing sample size. 
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Table 5 

For a simulation size of N — 10,000,000, the table records the percentage of simulation cases where the true value was left or 

right of the confidence interval 



Method 




99% CI 




95% CI 




90% CI 


Outside left 


Outside right 


Outside left 


Outside right 


Outside left 


Outside right 


Target value 


0.50% 


0.50% 


2.50% 


2.50% 


5.00% 


5.00% 


Jeffreys 


0.009% 


0.010% 


0.245% 


0.245% 


0.958% 


0.960% 


Ghosh and Kim 


0.022% 


0.023% 


0.543% 


0.545% 


2.027% 


2.028% 


Likelihood ratio 


3.884% 


4.421% 


9.718% 


9.247% 


13.597% 


14.142% 


Third order 


0.402% 


0.401% 


2.021% 


2.023% 


4.045% 


4.043% 


(2 Sr> limits) 


(±0.002%) 


(±0.002%) 


(±0.005%) 


(±0.005%) 


(±0.007%) 


(±0.007%) 



The target value is the corresponding confidence value; the methods are the Bayesian Jeffreys and Ghosh and Kim and the 
frequentist likelihood ratio and the third order. 



For presentation here we chose the smahest pos- 
sible sample sizes ni = n2 = 2 and the equal vari- 
ance case and increased the simulation size to = 
10,000,000. Then for central confidence intervals at 
levels 99%, 95%, 90% we calculated the percentage 
of cases with true parameter value on the left side 
and on the right side of the confidence interval; the 
results are recorded in Table 5, where we also record 
the estimated simulation limits. 

The results address a most extreme case of the 
Behrens-Fisher problem: samples of size ni = n2 = 
2. The third-order performance seems reasonably 
close to the target. It does, however, deviate by more 
than the simulation limits would suggest; but it does 
represent a substantial improvement over available 
procedures. 

15. DISCUSSION 

We have surveyed inference procedures for obtain- 
ing frequentist p-values and Bayesian posterior sur- 
vivor s-values, as well as the corresponding con- 
fidence intervals and posterior intervals. Our em- 
phasis has been on the use of higher-order likeli- 
hood methods to obtain increased accuracy and we 
have verified the increased accuracy with extensive 
McMC simulations. 

To motivate the presentation of the procedures 
we have used a very simple linear model but with 
nonnormal errors. The example does have an appro- 
priate default prior so the frequentist and Bayesian 
methods are comparable. 

For a more complex example we have reported 
on extensive simulations for the most extreme case 
of the Behrens-Fisher problem, an example that is 



simple in the sense of involving only normal sam- 
ples but complex in its long-standing history of de- 
fying both frequentist and Bayesian theoretical ap- 
proaches. The higher-order methods lead to values 
that quite accurately assess the difference in means, 
the typical parameter of interest, and from simula- 
tions outperform available Bayesian methods. 

We have also examined McMC methods from a 
statistical viewpoint and illustrated them by exten- 
sive assessments of higher-order likelihood methods. 

In brief we have found that higher-order likelihood 
using MLEs and observed information can yield the 
precision of 4 million simulation steps given a suit- 
able statistic. In addition they provide focused accu- 
racy by precisely separating information on almost 
any scalar parameter chosen as of interest. Various 
examples illustrating the theory are also included 
with the references. 

APPENDIX 

(i) Regression conditional distribution. For the 

regression model y = X(3 + az with error density 
= nr=i9(-^i)) can examine how parameter 
change affects the n coordinates yi and how conti- 
nuity determines a conditional distribution having 
dimension equal to that of the parameter. Conve- 
nient coordinates (6, s) corresponding to (/3, a) are 
available from least squares or maximum likelihood 
(see, e.g., Fraser, 1979, 2004); in either case we have 

h{y)=(5^ah{z), 

s{y) =as{z), 

and then have the standardized residual vector 

d{y) = s~Hy){y - Xb{y)} 

= s-\z){z-Xb{z)} = d{z). 
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It follows with observed data ip that d{z) = d{'ip) 
which then implies that the appropriate model 
should be conditional. Routine calculations (Fraser, 
1979) then give the null distribution 

n 

g{b, s)db ds = c\{g{Xib + sd^i)s''-'"-^ dbds, 
1=1 

where Xi is the ith row of X and d^ is the ith el- 
ement of the observed standardized residual d{z) = 
d{y^); the nonnull distribution for {b{y), s{y)} is then 



evaluating 



g{b, s; /?, a) db ds 

n 

= cl[g[a~^{Xi{b-P) + sd^i}] 



i=l 



„n— r— 1 



a' 



db ds. 



This can be rewritten directly in terms of the ob- 
served likelihood L^{f3,a;y^) =cf{y^;P,a) as 



db ds 



where 



g{b,s;P,a) dbds = L^{f3^,a^) 



/?* = 6° + -(/3-&), 
s 



cr* = — a. 

s 

(ii) Simulation standard deviation. In a Bernoulli 
sequence the observed proposition p has a standard 
deviation {pq/Ny^'^ , which is bounded by I/IN^^"^ . 
An McMC sequence will typically have serial cor- 
relations; accordingly we worked in batches of -B = 
1000 and dropped the first 50 and retained the re- 
maining 950 in each batch. We tested and found the 
sequence of Nb = 4,000,000/S = 4000 batch means 
to be essentially free of correlation. We calculated 
the usual standard deviation s of the batch means 

1 /2 

and then obtained an upper bound estimate s/N^ 
for the standard deviation of the overall mean. This 
can be fine-tuned for probabilities away from 1/2 by 
using p in place of 1/2 in the usual binomial variance 
formula. 

(iii) The surrogate for 'ip{6). The rotated ip coor- 
dinate is obtained using a coefficient vector a applied 
to the vector, 



(49) 



\%'{e^)\ 

the row vector a' multiplying '^{9) is the unit vector 
version of the gradient '4jipi{9^) and is obtained by 



89' V 06' ) 

at 6^, and then normalizing; this gives a unit vector 
perpendicular in the ip coordinates to ip{6{ip)} at 
(^^. The use of the unit vector in (49) produces a 
rotated coordinate of p>{9) that agrees with ^^(0) at 
9^ in the sense of being first derivative equivalent to 
i^{9) at the point 9^. 

(iv) Information determinants. The information 
determinants are recalibrated to the parameteri- 
zation 



(50) 



\ji\\)0^(>)\ 



\jee\\^e{9)\ ^ 

\jxxi9^,)\\px'i9^,T^ 

\jxx{9^)\\X\-^, 



where the right-hand px {p — 1) determinant |^| = 
uses X = (px'{9^) and in the regression 
context records the volume on the regression sur- 
face as a proportion of volume for the regression 
coefficients. 
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